Smart Paper Notes

BON: An extended public domain dataset for human activity recognition

Girmaw Abebe Tadesse , Oliver Bent , Komminist Weldemariam , Md. Abrar Istiak , Taufiq Hasan , Andrea Cavallaro

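Category: Computer Vision

2022-09-12

Body-worn first-person vision (FPV) cameras make it possible to extract rich information about the environment from the wearer's viewpoint. However, research on wearable-camera-based egocentric office activity understanding has progressed slowly compared with other activity environments (e.g., kitchen and outdoor ambulatory), mainly because of the lack of adequate datasets for training more sophisticated (e.g., deep learning) models for human activity recognition in office environments. This paper presents a large, publicly available office activity dataset (BON) collected with a chest-mounted GoPro Hero camera in different office settings across three geographical locations: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). The BON dataset contains eighteen common office activities, which can be grouped into person-to-person interactions (e.g., chatting with colleagues), person-to-object interactions (e.g., writing on a whiteboard), and proprioceptive activities (e.g., walking). Annotations are provided for each 5-second video segment. In total, BON contains 25 subjects and 2639 segments. To facilitate further research in this sub-domain, we also provide results that can serve as baselines for future studies.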

Translated by Google Translate

Sparsity-based Feature Selection for Anomalous Subgroup Discovery

Girmaw Abebe Tadesse , William Ogallo , Catherine Wanjiru , Charles Wachira , Isaiah Onando Mulang' , Vibha Anand , Aisha Walcott-Bryant , Skyler Speakman

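Categories: Machine Learning | Artificial Intelligence

2022-01-06

Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and it is widely applicable across domains. Multiple anomaly detection techniques have been proposed in the state of the art. However, a principled and scalable feature selection method for efficient discovery is commonly lacking. Existing feature selection techniques typically optimize the performance of outcome prediction rather than systemic deviations from the expected. In this paper, we propose a sparsity-based automated feature selection (SAFS) framework, which encodes systemic outcome deviations via the sparsity of feature-driven odds ratios. SAFS is a model-agnostic approach that is usable across different discovery techniques. When validated on a publicly available critical care dataset, SAFS reduces computation time by more than 3× while maintaining detection performance. SAFS also achieves superior performance compared with multiple feature selection baselines.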
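
The idea of scoring a feature by the sparsity of its odds ratios can be illustrated with a small sketch. The abstract does not give the SAFS formulas, so everything here is an assumption made for illustration: the per-value odds ratio against the rest of the data, the +0.5 smoothing, the Gini index as the sparsity measure, and the function names `odds_ratios`, `gini_sparsity`, and `rank_features`. The toy data is invented.

```python
from collections import defaultdict


def odds_ratios(values, outcomes):
    """Odds ratio of a positive outcome for each value of one feature, vs. the rest."""
    counts = defaultdict(lambda: [0, 0])  # value -> [negatives, positives]
    for v, y in zip(values, outcomes):
        counts[v][y] += 1
    total_neg = sum(c[0] for c in counts.values())
    total_pos = sum(c[1] for c in counts.values())
    ratios = {}
    for v, (neg, pos) in counts.items():
        # Add 0.5 to every cell so empty cells do not cause division by zero.
        a, b = pos + 0.5, neg + 0.5
        c, d = (total_pos - pos) + 0.5, (total_neg - neg) + 0.5
        ratios[v] = (a * d) / (b * c)
    return ratios


def gini_sparsity(xs):
    """Gini index of non-negative numbers: 0 when all are equal, larger when few dominate."""
    xs = sorted(xs)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def rank_features(rows, outcomes):
    """Order feature names from most to least sparse odds ratios (assumed criterion)."""
    scores = {}
    for name in rows[0]:
        ratios = odds_ratios([r[name] for r in rows], outcomes)
        scores[name] = gini_sparsity(ratios.values())
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy data: "ward" separates the outcomes, "shift" does not.
rows = [{"ward": w, "shift": s}
        for w, s in zip("AAAABBBB", ["day", "night"] * 4)]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
ranking = rank_features(rows, outcomes)
```

In this toy case the odds ratios of `shift` are identical across its values, so its sparsity score is zero and it ranks below `ward`.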

Automated Supervised Feature Selection for Differentiated Patterns of Care

Catherine Wanjiru , William Ogallo , Girmaw Abebe Tadesse , Charles Wachira , Isaiah Onando Mulang' , Aisha Walcott-Bryant

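Categories: Machine Learning | Artificial Intelligence

2021-11-05

An automated feature selection pipeline was developed using several state-of-the-art feature selection techniques to select optimal features for Differentiating Patterns of Care (DPOC). The pipeline included three types of feature selection techniques (filter, wrapper, and embedded methods) to select the top K features. Five different datasets with binary dependent variables were used, and their respective top K optimal features were selected. The selected features were tested in the existing multi-dimensional subset scanning (MDSS) pipeline, where the most anomalous subpopulations, most anomalous subsets, propensity scores, and measured effects were recorded to assess performance. This performance was compared with the same four metrics obtained when using all covariates in the dataset in the MDSS pipeline. We found that, regardless of the feature selection technique used, the data distribution is the key factor to consider when deciding which technique to use.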
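
Of the three families named above, a filter method is the simplest to sketch: each feature is scored on its own against the binary dependent variable, and the K highest-scoring features are kept. The example below is only an illustration under assumptions not taken from the abstract: absolute Pearson correlation as the score, the helper names `pearson` and `top_k_filter`, and invented toy columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_k_filter(columns, target, k):
    """Keep the k feature names with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy data with a binary dependent variable.
columns = {
    "age": [30, 40, 50, 60, 70, 80],
    "noise": [1, 0, 1, 0, 1, 0],
    "const": [5, 5, 5, 5, 5, 5],
}
target = [0, 0, 0, 1, 1, 1]
selected = top_k_filter(columns, target, k=2)
```

Wrapper and embedded methods differ in that they involve a model: a wrapper repeatedly fits a model on candidate subsets, while an embedded method reads feature importance from a single fitted model.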

Addressing Data Heterogeneity in Decentralized Learning via Topological Pre-processing

Waqwoya Abebe , Ali Jannesari

Category: Machine Learning

2022-12-16

Recently, local peer topology has been shown to influence the overall convergence of decentralized learning (DL) graphs in the presence of data heterogeneity. In this paper, we demonstrate the advantages of constructing a proxy-based locally heterogeneous DL topology to enhance convergence and maintain data privacy. In particular, we propose a novel peer clumping strategy to efficiently cluster peers before arranging them in a final training graph. By showing how locally heterogeneous graphs outperform locally homogeneous graphs of similar size and from the same global data distribution, we present a strong case for topological pre-processing. Moreover, we demonstrate the scalability of our approach by showing how the proposed topological pre-processing overhead remains small in large graphs while the performance gains get even more pronounced. Furthermore, we show the robustness of our approach in the presence of network partitions.

TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data

Beichen Zhang , Frank Schilder , Kelly Helm Smith , Michael J. Hayes , Sherri Harms , Tsegaye Tadesse

Categories: Natural Language Processing | Machine Learning

2022-12-07

Acquiring a better understanding of drought impacts becomes increasingly vital under a warming climate. Traditional drought indices describe mainly biophysical variables and not impacts on social, economic, and environmental systems. We utilized natural language processing and bidirectional encoder representation from Transformers (BERT) based transfer learning to fine-tune the model on the data from the news-based Drought Impact Report (DIR) and then apply it to recognize seven types of drought impacts based on the filtered Twitter data from the United States. Our model achieved a satisfying macro-F1 score of 0.89 on the DIR test set. The model was then applied to California tweets and validated with keyword-based labels. The macro-F1 score was 0.58. However, due to the limitation of keywords, we also spot-checked tweets with controversial labels. 83.5% of BERT labels were correct compared to the keyword labels. Overall, the fine-tuned BERT-based recognizer provided proper predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experiential domain expertise.
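
The macro-F1 scores quoted above average the per-class F1 over all classes, so rare impact types count as much as frequent ones. A minimal sketch of that calculation follows; the function name `macro_f1` and the three class labels are hypothetical and are not the impact categories used in the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, for illustration only.
y_true = ["agri", "agri", "fire", "fire", "water", "water"]
y_pred = ["agri", "fire", "fire", "fire", "water", "agri"]
score = macro_f1(y_true, y_pred)
```

Here the per-class F1 values are 0.5, 0.8, and 2/3, so the macro average is about 0.656.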

Resource-Aware Heterogeneous Federated Learning using Neural Architecture Search

Sixing Yu , Phuong Nguyen , Waqwoya Abebe , Justin Stanley , Pablo Munoz , Ali Jannesari

Categories: Machine Learning | Computer Vision

2022-11-09

Federated Learning (FL) is extensively used to train AI/ML models in distributed and privacy-preserving settings. Participant edge devices in FL systems typically contain non-independent and identically distributed (Non-IID) private data and unevenly distributed computational resources. Preserving user data privacy while optimizing AI/ML models in a heterogeneous federated network requires us to address data heterogeneity and system/resource heterogeneity. Hence, we propose Resource-aware Federated Learning (RaFL) to address these challenges. RaFL allocates resource-aware models to edge devices using Neural Architecture Search (NAS) and allows heterogeneous model architecture deployment by knowledge extraction and fusion. Integrating NAS into FL enables on-demand customized model deployment for resource-diverse edge devices. Furthermore, we propose a multi-model architecture fusion scheme allowing the aggregation of the distributed learning results. Results demonstrate RaFL's superior resource efficiency compared to SoTA.

Quantitative Assessment of Drought Impacts Using XGBoost based on the Drought Impact Reporter

Beichen Zhang , Fatima K. Abu Salem , Michael J. Hayes , Tsegaye Tadesse

Category: Machine Learning

2022-11-04

Under climate change, the increasing frequency, intensity, and spatial extent of drought events lead to higher socio-economic costs. However, the relationships between the hydro-meteorological indicators and drought impacts are not identified well yet because of the complexity and data scarcity. In this paper, we proposed a framework based on the extreme gradient model (XGBoost) for Texas to predict multi-category drought impacts and connected a typical drought indicator, Standardized Precipitation Index (SPI), to the text-based impacts from the Drought Impact Reporter (DIR). The preliminary results of this study showed an outstanding performance of the well-trained models to assess drought impacts on agriculture, fire, society & public health, plants & wildlife, as well as relief, response & restrictions in Texas. It also provided a possibility to appraise drought impacts using hydro-meteorological indicators with the proposed framework in the United States, which could help drought risk management by giving additional information and improving the updating frequency of drought impacts. Our interpretation results using the Shapley additive explanation (SHAP) interpretability technique revealed that the rules guiding the predictions of XGBoost comply with domain expertise knowledge around the role that SPI indicators play around drought impacts.

Lost in Translation: Reimagining the Machine Learning Life Cycle in Education

Lydia T. Liu , Serena Wang , Tolani Britton , Rediet Abebe

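Category: Artificial Intelligence

2022-09-08

Machine learning (ML) techniques are increasingly prevalent in education, from predicting student dropout to assisting in university admissions and facilitating the rise of MOOCs. Given the rapid growth of these novel uses, there is a pressing need to investigate how ML techniques support long-standing education principles and goals. In this work, we shed light on this complex landscape by drawing on qualitative insights from interviews with education experts. These interviews comprise in-depth evaluations of ML for education (ML4Ed) papers published in preeminent applied ML conferences over the past decade. Our central research goal is to critically examine how the stated or implied education and societal objectives of these papers align with the ML problems they tackle; that is, to what extent the technical problem formulation, objectives, approach, and interpretation of results align with the education problem at hand. We find that a cross-disciplinary gap exists and is particularly salient in two parts of the ML life cycle: the formulation of an ML problem from education goals and the translation of predictions into interventions. We use these insights to propose an extended ML life cycle, which may also apply to the use of ML in other domains. Our work joins a growing number of meta-analytical studies across education and ML research, as well as critical analyses of the societal impact of ML. Specifically, it fills a gap between the prevailing technical understanding of machine learning and the perspective of education researchers who work with students and in policy.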

Adversarial Scrutiny of Evidentiary Statistical Software

Rediet Abebe , Moritz Hardt , Angela Jin , John Miller , Ludwig Schmidt , Rebecca Wexler

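Category: Machine Learning

2022-06-19

The U.S. criminal legal system increasingly relies on software output to convict and incarcerate people. In a large number of cases each year, the government makes these consequential decisions based on evidence from statistical software (such as probabilistic genotyping, environmental audio detection, and toolmark analysis tools) that defense counsel cannot fully cross-examine or scrutinize. This undermines the commitments of the adversarial criminal legal system, which relies on the defense's ability to probe and test the prosecution's case in order to safeguard individual rights. Responding to this need to adversarially scrutinize the output of such software, we propose robust adversarial testing as an audit framework for examining the validity of evidentiary statistical software. We define and operationalize this notion of robust adversarial testing by drawing on a large body of recent work in robust machine learning and algorithmic fairness. We demonstrate how this framework both standardizes the process of scrutinizing such tools and empowers defense lawyers to examine their validity for the instances most relevant to the case at hand. We further discuss existing structural and institutional challenges within the U.S. criminal legal system that may create barriers to implementing this and other such audit frameworks, and we close with a discussion of policy changes that could help address these concerns.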

Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems

Minsig Han , Ameha T. Abebe , Chung G. Kang

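Category: Machine Learning

2022-05-22

In grant-free sparse code multiple access (GF-SCMA) systems, active user detection (AUD) is a major performance bottleneck because it involves a complex combinatorial problem, which makes the joint design of the users' contention resources and the AUD at the receiver a crucial but challenging problem. To this end, we propose an autoencoder (AE)-based joint optimization of the preamble generation networks (PGNs) on the encoder side and the data-aided AUD on the decoder side. The core architecture of the proposed AE is a novel user activity extraction network (UAEN) in the decoder, which extracts a priori user activity information from the SCMA codeword data for the data-aided AUD. End-to-end training of the proposed AE enables joint optimization of the contention resources, i.e., the preamble sequences, each associated with one of the codebooks, together with the extraction of user activity information from both the preamble and the SCMA-based data transmission. Furthermore, we propose a self-supervised pre-training scheme for the UAEN prior to the end-to-end training, to ensure the convergence of the UAEN, which lies deep inside the AE network. Simulation results compare the proposed scheme against state-of-the-art DL-based AUD schemes.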
